Properties of Stochastic Perceptual Auditory-event-based Models for Automatic Speech Recognition
نویسنده
چکیده
Recently, physiological and psychoacoustic studies have uncovered new evidence supporting the idea that human auditory processes focus on the transitions between spoken sounds rather than on the steady-state portions of spoken sounds for speech recognition. Stochastic Perceptual Auditory-event-based Models (SPAMs) were developed by Morgan, Bourlard, Hermansky and Greenberg to take this new evidence into account for word models in speech recognition by machines. This paper details our efforts to build a speech recognition system based on some of the properties of SPAMs. Although not all aspects of the complete SPAM theory have been implemented, we did find that fairly good recognition is possible with a system that concentrates almost exclusively on the transitions between speech sounds. Additionally, we found that such a system enhanced the more conventional phoneme-based system, which emphasized recognition of steady-state sounds. This blended system performed better than either system alone, especially in the case of noise-obscured speech.
منابع مشابه
Stochastic perceptual auditory-event-based models for speech recognition
We have developed a statistical model of speech that incorporates certain temporal properties of human speech perception. The primary goal of this work is to avoid a number of current constraining assumptions for statistical speech recognition systems, particularly the model of speech as a sequence of stationary segments consisting of uncorrelated acoustic vectors. A focus on perceptual models ...
متن کاملمدل میکروسکوپی دوگوشی مبتنی بر فیلتر بانک مدولاسیون برای پیش گویی قابلیت فهم گفتار در افراد دارای شنوایی عادی
In this study, a binaural microscopic model for the prediction of speech intelligibility based on the modulation filter bank is introduced. So far, the spectral criteria such as the STI and SII or other analytical methods have been used in the binaural models to determine the binaural intelligibility. In the proposed model, unlike all models of binaural intelligibility prediction, an automatic ...
متن کاملModeling of three types of auditory nerve and its application in speech recognition
A novel auditory nerve model is described here which simulates the three types of auditory nerves existing in the auditory system. The inspiration of the model is the absence of the simulation of the different types of auditory nerves in current auditory models. Based on the previous work, three sub-models replace the prevailing single auditory nerve discharge model in the common peripheral aud...
متن کاملTransfer from action to perception: The effect of motor-perceptual enrichment
This study investigated the effect of audiovisual integration on action-perception transfer.40 subjects were randomly divided four groups: visual, visual-auditory, control visual and control visual-auditory. Visual groups watched pattern skilled basketball player and other groups in addition to watching pattern skilled basketball player, heard Elbow angular velocity as sonification. In first st...
متن کاملApplying physiologically-motivated models of auditory processing to automatic speech recognition
For many years the human auditory system has been an inspiration for developers of automatic speech recognition systems because of its ability to interpret speech accurately in a wide variety of difficult acoustical environments. This paper discusses the application of physiologically-motivated approaches to signal processing that facilitate robust automatic speech recognition in environments w...
متن کامل